Search Result

User opinion extraction based on adaptive crowd labeling with cost constraint

ZHAO Wei, LIN Yuming, HUANG Taoyi, LI You

Journal of Computer Applications 2019, 39 (5): 1351-1356. DOI: 10.11772/j.issn.1001-9081.2018112496

User reviews contain a wealth of opinion information that has great reference value for potential customers and merchants. Opinion targets and opinion words are the core objects of user reviews, so extracting them automatically is a key task for intelligent applications built on user reviews. At present, the problem is solved mainly by supervised extraction methods, which depend on high-quality labeled samples to train the model, and traditional manual labeling is time-consuming, laborious and costly. Crowdsourcing computing provides an effective way to build a high-quality training sample set. However, the quality of the labeling results is uneven because of factors such as the knowledge background of the workers. To obtain high-quality labeled samples at a limited cost, an adaptive crowdsourcing labeling method based on evaluating the professional level of workers was proposed to construct a reliable dataset of opinion target-opinion word pairs. Firstly, workers with a high professional level were identified at a small cost. Then, a task distribution mechanism based on worker reliability was designed. Finally, an effective fusion algorithm for labeling results was designed by using the dependency relationship between opinion targets and opinion words, and the final reliable results were generated by integrating the labeling results of different workers. A series of experiments on real datasets show that, when the cost budget is low, the reliability of the opinion target-opinion word dataset built by the proposed method is about 10% higher than that obtained with the GLAD (Generative model of Labels, Abilities, and Difficulties) model and the MV (Majority Vote) method.
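To make the contrast between the MV baseline and reliability-aware fusion concrete, the following is a minimal Python sketch. It is not the fusion algorithm from the paper, which also uses the dependency relationship between opinion targets and opinion words; it only illustrates how weighting votes by an estimated worker reliability can change the result. The function names, the label strings, the reliability scores and the default weight of 0.5 are all assumptions made for illustration.

```python
from collections import defaultdict


def majority_vote(labels):
    """Return the most frequent label; `labels` maps worker id -> label.

    Ties are broken by the alphabetical order of the labels.
    """
    counts = defaultdict(float)
    for label in labels.values():
        counts[label] += 1.0
    return max(sorted(counts), key=counts.get)


def weighted_vote(labels, reliability, default_weight=0.5):
    """Return the label with the largest total reliability weight.

    `reliability` maps worker id -> score in [0, 1] (hypothetical values);
    workers without a score fall back to `default_weight` (an assumption).
    """
    scores = defaultdict(float)
    for worker, label in labels.items():
        scores[label] += reliability.get(worker, default_weight)
    return max(sorted(scores), key=scores.get)


# One reliable worker disagrees with two unreliable ones.
labels = {"w1": "target", "w2": "other", "w3": "other"}
reliability = {"w1": 0.95, "w2": 0.40, "w3": 0.40}

mv_result = majority_vote(labels)
weighted_result = weighted_vote(labels, reliability)
```

In this toy case the plain majority picks "other", while the weighted vote follows the single high-reliability worker and picks "target".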

Balanced clustering based on simulated annealing and greedy strategy

TANG Haibo, LIN Yuming, LI You, CAI Guoyong

Journal of Computer Applications 2018, 38 (11): 3132-3138. DOI: 10.11772/j.issn.1001-9081.2018041338

To address the requirement that clustering results be balanced in many practical applications, a Balanced Clustering algorithm based on Simulated annealing and Greedy strategy (BCSG) was proposed. The algorithm consists of two steps, Simulated Annealing Clustering Initialization (SACI) and Balanced Clustering based on Greedy Strategy (BCGS), which together improve clustering effectiveness at a lower time cost. First, K suitable data points in the dataset were located by simulated annealing to serve as the initial centers of balanced clustering. Then, the data points nearest to each center were greedily added, in stages, to that center's cluster until the cluster size reached its upper limit. A series of experiments on six real UCI datasets and two public image datasets show that the balance degree is increased by more than 50 percentage points compared with Fuzzy C-Means when the number of clusters is large, and that the clustering accuracy is increased by 8 percentage points compared with Balanced K-Means and BCLS (Balanced Clustering with Least Square regression), both of which have good balanced clustering performance. Meanwhile, the time complexity of BCSG is lower, and its running time on large datasets is nearly 40% shorter than that of Balanced K-Means. BCSG thus achieves better clustering effectiveness at a lower time cost than other balanced clustering algorithms.
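The greedy, capacity-limited assignment step can be sketched in Python as follows. This is a simplified illustration and not the BCSG procedure itself: the simulated annealing initialization is omitted (the centers are simply passed in), the staged assignment is replaced by a single pass over all point-center pairs sorted by distance, and the upper limit on cluster size is assumed to be ceil(n / K). The function name and the sample data are hypothetical.

```python
import math


def greedy_balanced_assign(points, centers):
    """Assign each point to a cluster so that no cluster exceeds ceil(n / k).

    Point-center pairs are visited in order of increasing distance; a pair is
    accepted if the point is still unassigned and the cluster has room.
    """
    n, k = len(points), len(centers)
    capacity = math.ceil(n / k)  # assumed upper limit on cluster size

    # (distance, point index, center index) for every pair, nearest first.
    pairs = sorted(
        (math.dist(p, c), i, j)
        for i, p in enumerate(points)
        for j, c in enumerate(centers)
    )

    assignment = [None] * n
    sizes = [0] * k
    for _, i, j in pairs:
        if assignment[i] is None and sizes[j] < capacity:
            assignment[i] = j
            sizes[j] += 1
    return assignment


# Three points sit near the first center, but each cluster may hold only two.
points = [(0, 0), (0, 1), (1, 0), (10, 10)]
centers = [(0, 0), (10, 10)]
assignment = greedy_balanced_assign(points, centers)
```

Because the total capacity is at least n, every point ends up assigned. Here the third point is pushed into the second cluster once the first one is full, which is the trade-off between compactness and balance that balanced clustering makes.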

One projection subspace pursuit for signal reconstruction in compressed sensing

LIU Xiaoqing, LI Youming, LI Chengcheng, JI Biao, CHEN Bin, ZHOU Ting

Journal of Computer Applications 2014, 34 (9): 2514-2517. DOI: 10.11772/j.issn.1001-9081.2014.09.2514

To reduce the complexity of signal reconstruction and to reconstruct signals of unknown sparsity, a new algorithm named One Projection Subspace Pursuit (OPSP) was proposed. Firstly, the upper and lower bounds of the signal's sparsity were determined based on the restricted isometry property, and the sparsity was set to the integer middle value of these bounds. Secondly, within the framework of Subspace Pursuit (SP), the projection of the observation onto the support set in each iteration was removed to decrease the computational complexity of the algorithm. Furthermore, the reconstruction rate of the whole signal was used as the index of reconstruction performance. The simulation results show that, compared with the traditional SP algorithm, the proposed algorithm reconstructs signals of unknown sparsity in less time and with a higher reconstruction rate, and that it is effective for signal reconstruction.
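For orientation, the sparsity estimate and one iteration of the standard SP algorithm that OPSP builds on can be written as below. The notation is assumed rather than taken from the paper, and the abstract does not specify exactly how OPSP restructures the projection step, so only the midpoint estimate and the conventional SP iteration are shown.

```latex
% Assumed notation: \Phi (measurement matrix), y (observation), K (sparsity),
% K_{\min}, K_{\max} (sparsity bounds), T^{\ell} (support estimate at
% iteration \ell), y_r^{\ell} (residual), \dagger (pseudo-inverse).
\begin{align*}
K &= \left\lfloor \tfrac{K_{\min} + K_{\max}}{2} \right\rfloor
  && \text{(integer midpoint; rounding down is an assumption)} \\
\tilde{T}^{\ell} &= T^{\ell-1} \cup
  \{\, K \text{ indices of the largest entries of } |\Phi^{\top} y_r^{\ell-1}| \,\} \\
x_p &= \Phi_{\tilde{T}^{\ell}}^{\dagger}\, y \\
T^{\ell} &= \{\, K \text{ indices of the largest entries of } |x_p| \,\} \\
y_r^{\ell} &= y - \Phi_{T^{\ell}} \Phi_{T^{\ell}}^{\dagger}\, y
\end{align*}
```

Standard SP performs two least-squares projections per iteration, one to compute $x_p$ and one to update the residual $y_r^{\ell}$, which is where most of its per-iteration cost lies.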

Research of software design methods in data dissemination

MA Wei-dong, LI You-ping

Journal of Computer Applications 2005, 25 (04): 913-914. DOI: 10.3724/SP.J.1087.2005.0913

Data dissemination is a novel active service architecture based on IP multicasting or IP broadcasting technology. It can be widely applied in LANs, WANs and digital broadcasting networks to transmit popular information to massive numbers of consumers. The service and transmission characteristics of data dissemination were summarized, and a re-transmission ratio and computing methods that can reduce the effect of communication channel errors were presented. The sending and receiving algorithms and implementation code for data dissemination were discussed.